Large Language Models with Mark III Systems
Since ChatGPT’s release in 2022, Generative AI has become a widespread topic of discussion across industry. These programs are built on Large Language Models (LLMs), which draw on enormous amounts of information to generate the most likely sequence of words for a given prompt. These models often scale to billions of parameters and can be trained on large datasets to produce specific, unique responses.
Before diving into the training process, Buchanan introduced webinar attendees to LLMs as a whole, referencing a paper by Samuel R. Bowman. These models are based on the transformer architecture, which originated in ‘Attention Is All You Need’, a 2017 paper by Ashish Vaswani and his research team.
According to Buchanan, there are two important steps in training these models: preprocessing and finetuning. In preprocessing, researchers use tokenization to convert text prompts (the input) into a numeric format the model can take in. When working with LLMs, Buchanan advises researchers to start from pre-trained models that already carry general language knowledge; the model can then be finetuned on a custom dataset, greatly reducing the time spent collecting and preparing data compared to building a model from scratch.
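As a rough illustration of the tokenization step (not Buchanan’s exact notebook code), the sketch below uses the Hugging Face transformers library to load the tokenizer paired with a pre-trained model and convert a prompt into token IDs; the model name and prompt text are only examples.

```python
# Minimal sketch of tokenization, assuming the Hugging Face transformers
# library; the model checkpoint and prompt are illustrative.
from transformers import AutoTokenizer

# Load the tokenizer that matches a pre-trained model.
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")

# Convert a text prompt into the numeric token IDs the model consumes.
encoded = tokenizer("What are the symptoms of dehydration?", return_tensors="pt")
print(encoded["input_ids"])                        # tensor of integer token IDs
print(tokenizer.decode(encoded["input_ids"][0]))   # round-trip back to text
```

Because the tokenizer is tied to the pre-trained checkpoint, the same vocabulary is reused during finetuning, which is part of what makes starting from an existing model so much faster than training from scratch.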
To demonstrate how an LLM is trained, Buchanan spent the second half of the webinar taking participants through the process, using the Falcon-7B model as an example in a Jupyter Notebook lab. Applying the QLoRA finetuning approach, a training strategy that loads the base model in compressed, quantized precision so it can be finetuned on smaller GPUs, Buchanan first showed the preprocessing phase (creating and applying a tokenizer and running inference examples). Afterward, she used the MedText dataset to finetune the model, analyzing output, defining the training arguments, and priming the model to produce better-quality answers.
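For readers who want a sense of what that setup looks like in code, here is a condensed sketch of a QLoRA configuration, assuming the transformers, peft, and bitsandbytes libraries; the hyperparameters and output directory are illustrative assumptions, not Buchanan’s exact values from the lab.

```python
# Sketch of a QLoRA finetuning setup for Falcon-7B; hyperparameters
# are illustrative, not the webinar's exact configuration.
import torch
from transformers import (AutoModelForCausalLM, BitsAndBytesConfig,
                          TrainingArguments)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# QLoRA: load the frozen base model in 4-bit quantized precision so it
# fits on a smaller GPU, then train only small low-rank adapter weights.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b", quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters; only these small matrices receive gradients.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["query_key_value"],  # Falcon's fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Define the training arguments before handing everything to a Trainer.
training_args = TrainingArguments(
    output_dir="falcon-7b-medtext",   # hypothetical output path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    max_steps=500,
    logging_steps=10,
)
```

From here, the quantized model, the training arguments, and the tokenized MedText examples would be handed to a Trainer to run the finetuning loop, after which the model’s answers on medical prompts can be compared against its pre-finetuning output.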
This workshop was the last session of the fall education series. For more information on LLMs and other current AI/ML trends, those interested can access replays of past sessions here.